Data smashing: uncovering lurking order in data.

Authors

  • Ishanu Chattopadhyay
  • Hod Lipson
Abstract

From automatic speech recognition to discovering unusual stars, underlying almost all automated discovery tasks is the ability to compare and contrast data streams with each other, to identify connections and spot outliers. Despite the prevalence of data, however, automated methods are not keeping pace. A key bottleneck is that most data comparison algorithms today rely on a human expert to specify what 'features' of the data are relevant for comparison. Here, we propose a new principle for estimating the similarity between the sources of arbitrary data streams, using neither domain knowledge nor learning. We demonstrate the application of this principle to the analysis of data from a number of real-world challenging problems, including the disambiguation of electro-encephalograph patterns pertaining to epileptic seizures, detection of anomalous cardiac activity from heart sound recordings and classification of astronomical objects from raw photometry. In all these cases and without access to any domain knowledge, we demonstrate performance on a par with the accuracy achieved by specialized algorithms and heuristics devised by domain experts. We suggest that data smashing principles may open the door to understanding increasingly complex observations, especially when experts do not know what to look for.

Similar resources

Data Smashing

Investigation of the underlying physics or biology from empirical data requires a quantifiable notion of similarity: when do two observed data sets indicate nearly identical generating processes, and when do they not? The discriminating characteristics to look for in data are often determined by heuristics designed by experts, e.g., distinct shapes of “folded” lightcurves may be used as “features...

Coordinating computational and visual approaches for interactive feature selection and multivariate clustering

Unknown (and unexpected) multivariate patterns lurking in high-dimensional datasets are often very hard to find. This paper describes a human-centered exploration environment, which incorporates a coordinated suite of computational and visualization methods to explore high-dimensional data for uncovering patterns in multivariate spaces. Specificall...

Smashing: Folding Space to Tile through Time

Partial differential equation solvers spend most of their computation time performing nearest neighbor (stencil) computations on grids that model spatial domains. Tiling is an effective performance optimization for improving the data locality and enabling coarse-grain parallelization for such computations. However, when the domains are periodic, tiling through time is not directly applicable du...

LURKING PATHWAY PREDICTION AND PATHWAY ODE MODEL DYNAMIC ANALYSIS (dissertation)

Signaling pathway analysis is important for studying the causes of diseases and drug treatment. Finding the lurking pathway from ligand to signature is a significant issue in studying the mechanism of how the cell responds to the stimulation signal. However, recent literature-based pathway analysis methods can only tell about highly differentially expressed pathways related to the experi...

To Sample or To Smash? Estimating reachability in large time-varying graphs

Time-varying graphs (T-graph) consist of a time-evolving set of graph snapshots (or graphlets). A T-graph property with potential applications in both computer and social network forensics is T-reachability, which identifies the nodes reachable from a source node using the T-graph edges over time period T. In this paper, we consider the problem of estimating the T-reachable set of a source node...

Journal:
  • Journal of the Royal Society, Interface

Volume 11, Issue 101

Publication date: 2014